Analyzing Urdu Social Media for Sentiments using Transfer Learning with Controlled Translations

Authors

  • Smruthi Mukund
  • Rohini Srihari
Abstract

The main aim of this work is to perform sentiment analysis on Urdu blog data. We use structural correspondence learning (SCL) to transfer sentiment analysis learning from Urdu newswire data to Urdu blog data. Identifying the pivots needed to transfer learning from the newswire domain to the blog domain is not trivial because Urdu blog data, unlike newswire data, is written in Latin script and exhibits code-mixing and code-switching behavior. We consider two oracles to generate the pivots: (1) a transliteration oracle, to accommodate script variation and spelling variation, and (2) a translation oracle, to accommodate code-switching and code-mixing behavior. To identify strong candidates for translation, we propose a novel part-of-speech tagging method that helps select words based on POS categories that strongly reflect code-mixing behavior. We validate our approach against a supervised learning method and show that the performance of our proposed approach is
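
As a rough illustration of the SCL idea named in the abstract, the sketch below learns a low-dimensional projection from unlabeled data pooled over both domains and appends it to the original features. It is not the authors' implementation. The function names, the `pivot_idx` argument, and the toy data are hypothetical; the pivots are simply assumed to be given (in the paper they come from the transliteration and translation oracles). Ridge regression is swapped in for the classifier-based pivot predictors of standard SCL to keep the example short.

```python
import numpy as np

def scl_projection(X_unlabeled, pivot_idx, k=2, ridge=1.0):
    """Learn an SCL-style projection (hypothetical helper, simplified).

    X_unlabeled: (n_samples, n_features) feature matrix pooled over
                 source (newswire) and target (blog) documents.
    pivot_idx:   column indices of the pivot features (assumed given).
    Returns theta with shape (k, n_nonpivot) and the non-pivot indices.
    """
    X = np.asarray(X_unlabeled, dtype=float)
    nonpivot_idx = np.setdiff1d(np.arange(X.shape[1]), pivot_idx)
    A = X[:, nonpivot_idx]                           # inputs: non-pivot features
    Y = np.where(X[:, pivot_idx] > 0, 1.0, -1.0)     # targets: pivot present or absent
    # One ridge-regression predictor per pivot, solved jointly.
    W = np.linalg.solve(A.T @ A + ridge * np.eye(A.shape[1]), A.T @ Y)
    # Top-k left singular vectors capture the shared structure of the predictors.
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    k = min(k, U.shape[1])
    return U[:, :k].T, nonpivot_idx

def augment(X, theta, nonpivot_idx):
    """Append the projected features to the original feature matrix."""
    X = np.asarray(X, dtype=float)
    return np.hstack([X, X[:, nonpivot_idx] @ theta.T])

# Toy usage with random binary features; columns 0-2 play the role of pivots.
rng = np.random.default_rng(0)
X_toy = (rng.random((40, 8)) > 0.5).astype(float)
theta, nonpivot = scl_projection(X_toy, pivot_idx=[0, 1, 2], k=2)
X_aug = augment(X_toy, theta, nonpivot)
```

A sentiment classifier would then be trained on the augmented source-domain features and applied to augmented target-domain features; that step is omitted here.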

Similar resources

A Review on Urdu Language Parsing

Natural Language Processing is a multidisciplinary area of Artificial Intelligence, Machine Learning, and Computational Linguistics for processing human language automatically. It involves understanding and processing human language. The way in which we share our content or feelings has always been of great importance in understanding and processing language. Parsing is the most suited ap...

Using Machine Learning Algorithms for Automatic Cyber Bullying Detection in Arabic Social Media

Social media allows people to interact and express their thoughts or feelings about different subjects. However, some users may write offensive tweets to others via social media, which is known as cyberbullying. Successful prevention depends on automatically detecting malicious messages. Automatic detection of bullying in the text of social media by analyzing the text of tweets via one of the machine l...

Analysis of Sentiments in Corporate Twitter Communication – A Case Study on an Issue of Toyota

Knowing about communication of specific issues in social media has become increasingly important for the reactive and proactive stakeholder-communication of enterprises. Tools have been designed to monitor social media sites and to aggregate data of discussions in social media. However, these tools do not consider the dynamics of discussions and are not able to reflect sentiments within these d...

Automatic Learning of Morphological Variations for Handling Out-of-Vocabulary Terms in Urdu-English Machine Translation

We present an approach for online handling of Out-of-Vocabulary (OOV) terms in Urdu-English MT. Since Urdu is morphologically richer than English, we expect a large portion of the OOV terms to be Urdu morphological variations that are irrelevant to English. We describe an approach to automatically learn English-irrelevant (target-irrelevant) Urdu (source) morphological variation rules from standa...

Weibo sentiments and stock return: A time-frequency view

This study provides new insights into the relationships between social media sentiments and the stock market in China. Based on machine learning, we classify microblogs posted on Sina Weibo, a Chinese variant of Twitter, into five detailed sentiments: anger, disgust, fear, joy, and sadness. Using wavelet analysis, we find close positive linkages between sentiments and the stock return, which h...

Publication date: 2012